(54) Abstract Title

Automated mixing of musical tracks

(57) For the automated mixing of two tracks of music to effect a cross-fade, variations in combined output 
volume are reduced by analysing either the intrinsic amplitude at which each track was mastered or the output 
amplitude (subsequent to amplification) and modifying either the intrinsic amplitude or the amplification 
during the transition from one track to the other. The intrinsic amplitude may be analysed by detecting, e.g. 
using peak detectors 120-124, the amplitudes and timings of peaks in several frequency bands (channels
Ch1-Ch3). Musical clashes during mixing can be avoided by analysing the intrinsic amplitudes of the two
tracks at similar frequencies to detect the likelihood of a clash, and in the event a clash is detected, reducing 
the output of one of the tracks at the relevant frequency. 
AUTOMATED COMPILATION OF MUSIC



2378873 



The present invention relates to the automated compilation of pieces of musical
content, usually referred to as "tracks", and more particularly, to compilation in which
one track is phased in over the top of another, preferably in a manner providing an
apparently seamless transition between tracks. This is known in current vernacular as
"mixing".

Our co-pending UK application (HP docket 30001926) discloses, inter alia, a system
and method for the automated compilation of tracks which are typically stored as
digital audio, such as on compact disc. In this system, the outputs of two digital audio
players are fed to an output, such as a set of speakers. The speed at which tracks from
the two CD players are played is adjusted, so that the beat of an incoming track is
matched to the speed of a track currently playing (known as "time stretching"), and
once this has been achieved an automated cross-fading device reduces the output
volume of the current track while increasing the output volume of the incoming track,
thereby to provide a seamless transition between them.

A first aspect of the present invention addresses the issue of amplification of each of
the tracks during the transition phase from one track to another, or "cross-fade". In an
automated system, in order to try to provide a seamless transition between tracks,
amplification of the outgoing track will typically be reduced at the same rate as the
amplification of the incoming track is increased, with the reduction and increase in
amplification starting at the same time. Frequently tracks are mixed so that the
incoming track is faded in over the end of the outgoing track, as a result of which the
volume on the outgoing track may well be reducing, since many dance tracks end
simply by fading out the volume to zero, or start by fading in the volume from zero
(i.e. the intrinsic amplitude or "mastered volume" of the recording is reduced to zero,
or increased from zero, as the case may be). In such a situation, unless the fade-out
rate of the intrinsic amplitude (and thus, for a constant level of amplification, the
volume) at the end of the outgoing track matches the fade-in rate of the intrinsic
amplitude at the beginning of the incoming track, and both are in turn matched with
the rate of cross-fading the amplification from one track to another, the transition
between the tracks will be subject to a variation in volume which is undesirable, since
it disturbs the seamless transition between incoming and outgoing tracks.

Accordingly, a first aspect of the present invention provides a method for the
automated mixing of at least two pieces of musical content comprising the steps of:

selecting first and second sections of first and second tracks respectively, over
which transition between playing the first and second tracks will be made;

sampling intrinsic recorded amplitude of the first and second tracks over the
first and second sections respectively;

simultaneously playing the first and second sections of the first and second
tracks;

effecting transition from playing the first track to playing the second track by
reducing output volume of the first track over duration of the first section and
increasing output volume of the second track over duration of the second section; and

using sampling of the intrinsic amplitude of at least one of the first and second
tracks to equalise variations in net output volume from the first and second tracks over
the duration of the transition.

Equalisation of variations in recorded amplitude may result merely in a reduction in
variations of net output volume in comparison to what would otherwise be the case, or
may result in a substantially constant net output volume, depending upon the extent of
equalisation. Equalisation may be achieved typically either by altering the
amplification of one or both tracks over the course of the transition, altering the
intrinsic recorded amplitude of one or both tracks, or a combination of both
techniques.

In one embodiment of equalisation by regulation of amplification for one or both of
the tracks, a series of synchronous intrinsic amplitude values are sampled from each
of the tracks, and contemporaneous values are then summed to determine the extent,
if any, to which the combined intrinsic amplitude varies over the transition phase.

The resultant variation in intrinsic amplitude is then used to generate an amplification
profile which is then applied proportionally to one or both the tracks during the
transition to equalise the net output volume. Equalisation by modification of intrinsic
amplitude may use the contemporaneous summed amplitude values to generate
discrete error values by which summed amplitude should be altered in order to
maintain a constant value over the transition phase.

In an alternative embodiment amplification or intrinsic amplitude modification is used
to configure predetermined sections of tracks to predetermined introduction and
play-out template profiles of amplitude against time, so that any two tracks conforming to
the profile (either by variation in amplification or intrinsic amplitude) may be mixed
together.

In yet a further embodiment an indication of variation in combined amplitude is
generated for a plurality of temporal juxtapositions of two tracks, and the temporal
juxtaposition having the lowest indicated variation is selected.
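
By way of illustration only, the selection of a temporal juxtaposition might be sketched as follows. The function name `best_overlap`, the use of per-beat amplitude lists, and the choice of "maximum minus minimum of the combined amplitude" as the indication of variation are assumptions made for this sketch; the specification does not prescribe a particular indicator.

```python
def best_overlap(fade_out, fade_in, overlaps):
    """Return the overlap (in beats) whose combined envelope varies least.

    fade_out / fade_in are hypothetical per-beat amplitude values for the end
    of the outgoing track and the start of the incoming track.
    """
    def spread(overlap):
        # The last `overlap` beats of the outgoing track play together with
        # the first `overlap` beats of the incoming track.
        lead = len(fade_out) - overlap
        combined = list(fade_out[:lead])
        combined += [a + b for a, b in zip(fade_out[lead:], fade_in[:overlap])]
        combined += list(fade_in[overlap:])
        # Assumed indicator of variation: loudest minus quietest point.
        return max(combined) - min(combined)

    return min(overlaps, key=spread)


outgoing = [1.0, 1.0, 0.75, 0.5, 0.25]
incoming = [0.25, 0.5, 0.75, 1.0, 1.0]
chosen = best_overlap(outgoing, incoming, range(0, 6))
```

With these example values an overlap of three beats makes every combined value equal to 1.0, so that juxtaposition is the one selected.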

Typically, the equalisation will be performed on the basis of the sampling of the
intrinsic amplitude in a particular frequency range determined as dominant, and this
will in turn typically be determined on the basis of the frequency of the beat used for
time stretching the incoming track and outgoing tracks.

A second and independent aspect of the present invention is concerned with the
musical elements present in the outgoing and incoming tracks, such as vocal lines,
melodic instrument parts, or percussion signatures (from, e.g. snare drums, cymbals
or handclaps etc.). It is not unusual for such elements in the outgoing and incoming
tracks to clash, even though the fundamental beats of the two tracks have been
matched, and the volume of the two tracks has been equalised over the cross fade.
The result of such a clash is that when these elements are heard together the result is
an unappealing mix.

Accordingly, a second aspect of the present invention provides a method for
automated mixing of first and second music tracks comprising the steps of:

selecting first and second sections of the first and second tracks respectively,
over which a transition between the first and second tracks will occur;

for at least selected intrinsic peak amplitudes of the first track, determining, in
accordance with at least one predetermined criterion, whether a musical clash exists
with an intrinsic peak amplitude from the second track; and

in the event of a clash, reducing output amplitude of at least one of the tracks
at least at a frequency of one of the clashing intrinsic peak amplitudes, and over a
time interval at least equal to duration of the aforesaid one of the intrinsic peak
amplitudes.

The reduction in output amplitude (which will typically also be a reduction in output
volume) of a given frequency band may again, as with the first aspect of the present
invention, be implemented either via adjustment of amplification over at least the
frequency of one of the clashing peak amplitudes (although this is only possible
where the system provides for differing amplification levels for different frequency
bands), or by copying at least the section of the track in question into addressable
memory, and altering the intrinsic recorded amplitude levels for that frequency band.
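
A minimal sketch of the clash test is given below. The specification leaves the "predetermined criterion" open, so the criterion used here (two peaks clash when they fall in the same frequency band and within a chosen time tolerance of each other) is purely an assumption for illustration, as are the names `find_clashes` and `tolerance` and the tuple layout of the peaks.

```python
def find_clashes(peaks_first, peaks_second, tolerance):
    """Return (band, time_first, time_second) for each pair of clashing peaks.

    Each peak is a hypothetical (band, time_in_seconds, amplitude) tuple.
    Assumed criterion: same band, and peak times within `tolerance` seconds.
    """
    clashes = []
    for band_1, time_1, _amp_1 in peaks_first:
        for band_2, time_2, _amp_2 in peaks_second:
            if band_1 == band_2 and abs(time_1 - time_2) <= tolerance:
                clashes.append((band_1, time_1, time_2))
    return clashes


first = [("high", 0.25, 0.9), ("low", 0.0, 1.0)]
second = [("high", 0.30, 0.8), ("mid", 0.5, 0.4)]
clashes = find_clashes(first, second, tolerance=0.1)
```

Each reported clash identifies the band and the time interval over which the output amplitude of one of the tracks would then be reduced, by either of the two routes described above.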

Yet a further independent aspect of the present invention provides a method of mixing
first and second tracks including the steps of:

analysing variations in amplitude with time and frequency for both tracks;

on the basis of the analysis, defining at least one frequency band common to
both tracks; and

equalising output amplitude of the tracks in the frequency band during mixing
from one track to another.

Thus the frequency band to be used in order to provide equalisation is defined on the
basis of the musical characteristics of the tracks to be mixed, rather than using
predetermined frequency bands which may not be appropriate having regard to the
frequencies of the two tracks to be mixed.

Embodiments of the invention will now be described, by way of example, and with
reference to the accompanying drawings, in which:

Fig. 1 is a schematic illustration of a mixing system for the compilation of music;

Fig. 2 is a graph of amplitude against time showing the mixing process between two
tracks;

Fig. 3 is a further larger scale graph of amplitude against time which additionally
shows frequency information;

Fig. 4 is a schematic representation of a part of a mixing system according to an
embodiment of the present invention;

Figs. 5A and B are graphs of variation in peak amplitude at different frequency bands
of two tracks which are to be mixed;

Figs. 6A to C are graphs illustrating a first type of processing of peak amplitude
values for the purpose of equalising the net output volume;

Figs. 7A to C are graphs showing generic intrinsic amplitude templates for the start
and end of a track;

Figs. 8A to D are graphs showing a further type of processing of peak amplitude
values for the purpose of equalising the net output volume;

Figs. 9A and B are graphs showing 3-dimensional mapping of amplitude against
frequency and time for two mixed tracks; and

Fig. 10 is an illustration of a manner in which clashes of frequency between mixed
tracks may be avoided.

Referring now to Fig. 1, a system for mixing musical tracks includes a pair of audio
players 10 and 20, which derive an audio signal (i.e. a signal which is amplifiable into
sound) from audio sources AS1, AS2 respectively. In the case of manual mixing
systems, audio players 10, 20 are typically turntables for playing vinyl records; this
apparently anachronistic equipment being the equipment of choice for the majority of
professional disc jockeys because it provides functionality not readily available with
other formats of audio source material such as compact discs. In the present
automated example the audio players 10, 20 are compact disc players which derive an
audio signal from audio data (i.e. data from which an audio signal may be derived, but
which is not directly amplifiable into sound) stored on audio sources in the form of
CDs. The present invention may however be implemented using any format of audio
player and source, provided that in the case of analogue players, where data
processing is required, conversion to digital data is performed on the output of the
audio players. The output of the audio players 10, 20 is passed through variable gain
amplifiers 30, 40 respectively, whose outputs are then passed via a mixer 50 to a
single set of loud speakers 60 (although individual sets of speakers may be provided
for each of the amplifiers 30, 40 if desired). In a modification, the gain controls of the
two variable gain amplifiers are linked, giving output into a single power amplifier;
this gain-linking mechanism is known as a cross fader and is frequently used by
professional DJs. The illustrated system is however preferred because of the
additional flexibility which it offers. Additionally, a processor 70 is connected to the
outputs of the audio players 10, 20, as well as the inputs of the amplifiers 30, 40, and
the processor 70 is connected directly to a random access memory 80.

The illustrated system is operable to decrease or "fade out" the output volume (i.e. the
amplitude of the output audio signal, which in this example is made manifest by the
speakers 60) of one track from one of the audio sources, e.g. audio source 1, while
simultaneously increasing or "fading in" the output volume from another track of
audio source 2; ideally this is done in a manner providing a seamless mix between the
outgoing and incoming tracks. The provision of such a seamless mix first of all
requires that the beats of the outgoing and incoming tracks are matched. This is done
by automatically regulating the speed at which one or both of the respective tracks are
played, and synchronising the beats of the tracks. The automation of such a process is
described in our co-pending European application (HP docket 30001926).

Additionally, the output volume of each of the tracks must be regulated to ensure that
there are no dramatic increases or decreases in net output volume (i.e. the combined
output volume of the tracks playing on audio players 10 and 20) during the course of
the transition from the outgoing track to the incoming track.

Referring now to Fig. 2, a graph of intrinsic recorded amplitude against time is
illustrated for two tracks Z1 and Z2 which are to be mixed; in this example the tracks
are stored on audio source materials 1 and 2. The intrinsic recorded amplitude is the
amplitude of the audio signal stored (in the form of audio data) on the audio source
material, so that if the audio signal derived from the audio data were amplified at a
constant level throughout its duration, the result would be a corresponding
progression of output volume with time. In other words, the intrinsic recorded
amplitude of a track may be thought of as corresponding to the volume at which the
track was mastered in a studio, and is shown here over the duration of a time period
Tx/f in which a transition, or cross fade, from track Z1 to Z2 is to be made. From the
graph it can be seen that the intrinsic amplitude of Z1 drops off relatively suddenly,
meaning that if the track is amplified at a constant level during the transition, the
output volume of the track will drop correspondingly suddenly. By contrast, the
intrinsic amplitude of track Z2 rises more steadily over the course of the time period
Tx/f. To provide a seamless transition, the net output volume (i.e. the combined output
volume of the two tracks) over the course of the transition should ideally be
substantially constant. In the present illustrated example, if both tracks Z1 and Z2 are
amplified at the same constant level over the course of the transition, the net output
volume will correspond to the sum of their intrinsic amplitudes, shown by the dashed
line L, which as can readily be seen is far from constant. To equalise the net output
volume, and preferably to make it substantially constant, it is therefore necessary to
adjust either the intrinsic amplitude or the amplification level of at least one, and
possibly both of the tracks over the course of the transition phase. According to one
aspect of the present invention, equalisation is achieved by analysing at least a part of
each of the tracks (in advance of playing the track) over the duration of the transition
phase between one track and another, and using the analysis to equalise the net output
volume when the track is played.
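
The behaviour of the dashed line L can be illustrated with a short sketch. The envelope values below are invented for the purpose of the example (one track dropping suddenly, the other rising steadily), and the helper names `net_output` and `variation` are hypothetical.

```python
def net_output(z1, z2, gain=1.0):
    """Sum two amplitude envelopes amplified at the same constant gain."""
    if len(z1) != len(z2):
        raise ValueError("envelopes must cover the same transition period")
    return [gain * (a + b) for a, b in zip(z1, z2)]


def variation(values):
    """Spread between the loudest and quietest points of an envelope."""
    return max(values) - min(values)


# Invented envelopes: Z1 drops off suddenly, Z2 rises steadily.
z1 = [1.0, 1.0, 0.2, 0.0, 0.0]
z2 = [0.0, 0.25, 0.5, 0.75, 1.0]
line_l = net_output(z1, z2)
```

Here the combined envelope first rises above and then dips below its starting level, which is the kind of variation that equalisation is intended to remove.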

Referring now to Fig. 3, variations in the intrinsic amplitude of a small part of the
section of track Z1 in which a transition to track Z2 has been chosen to take place are
shown in more detail, i.e. with a larger scale and with the frequency information
devolved onto a third orthogonal graphical axis, which makes it possible to consider
visually the temporal occurrence of different frequency elements independently of
each other with relative ease, while still retaining information on the timing between
them. Fig. 3 shows three different frequency bands, viz low-frequency elements fL
(e.g. bass lines), mid-frequency elements fM and high-frequency elements fH, although
many more may be defined in a practical system. Similarly, it should be noted that in
practice the amplitude signature of a track is likely to be significantly more complex,
both in terms of the mixture of frequency components and the variations in intrinsic
amplitude of those components, than has been illustrated here for purposes of
explanation.

Referring now to Fig. 4, the architecture of a system for analysing variations in
intrinsic amplitude by sampling different frequency bands is illustrated schematically.
A digitised audio signal (whether generated intrinsically from a CD, or as a result of
conversion from an analogue source) from track Z1 is sampled prior to mixing of the
track by using the system of Fig. 4, and is passed through three parallel signal
processing channels Ch1 (fL), Ch2 (fM), Ch3 (fH), each of which has a frequency
pass-band filter: low pass filter 110, mid pass filter 112 and high pass filter 114
respectively. The outputs of each of the filters 110-114 are sent to a peak detector
120-124 respectively. The peak detectors are each reset periodically by a master
clock 130, whose period T is set by processor 70 to equal the beat of the track as
determined (at least for the duration of the transition phase between tracks Z1 and Z2)
by the time-stretching process described fully in our co-pending European application
00303960.0. The peak detectors 120-124 thus periodically generate an output
corresponding to the maximum value of intrinsic amplitude $A_{Cn}$ in the respective
frequency range once per beat of the track Z1. In addition, each of the peak detectors
120-124 incorporates an auxiliary clock 140-144 respectively which is reset
simultaneously with the peak detector by the master clock 130. The auxiliary clocks
provide a time value $t_{Cn}$ indicative of the instant in time over the course of a given
cycle of the master clock 130 (and therefore the beat of the track) at which the peak
intrinsic amplitude occurred. For a given frequency channel, this time value may well
be the same each time, because the peak intrinsic amplitude in any given channel is
likely to have a constant relationship in time with the beat of the track, which in turn
is typically constant. However, as will be seen subsequently, it is useful in
determining relative timing of peaks in different channels.

It is not essential to provide sampled outputs from the individual channels based on
peak amplitude. For example, in an alternative configuration an integrating circuit
may be used in conjunction with the master clock to provide a series of average
amplitude values over the course of each clock cycle.
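
A software analogue of one sampling channel, covering both the peak-detecting and the averaging configurations, might look like the sketch below. It assumes that band filtering has already been performed upstream and that the beat period T is known; the function names and the list-of-samples representation are hypothetical and are not taken from the described hardware.

```python
def peaks_per_beat(samples, sample_rate, beat_period):
    """Return (peak_amplitude, time_within_beat) for each complete beat.

    `samples` is one band-filtered channel; the window length plays the role
    of the master clock period T, and the offset that of the auxiliary clock.
    """
    window = int(round(sample_rate * beat_period))
    if window <= 0:
        raise ValueError("beat period too short for this sample rate")
    result = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = [abs(s) for s in samples[start:start + window]]
        peak = max(chunk)
        # Time of the first occurrence of the peak, measured from the reset.
        result.append((peak, chunk.index(peak) / sample_rate))
    return result


def means_per_beat(samples, sample_rate, beat_period):
    """Alternative configuration: average amplitude over each beat."""
    window = int(round(sample_rate * beat_period))
    if window <= 0:
        raise ValueError("beat period too short for this sample rate")
    return [
        sum(abs(s) for s in samples[start:start + window]) / window
        for start in range(0, len(samples) - window + 1, window)
    ]
```

For example, with eight samples `[0, 3, 1, 0, 2, 0, 5, 1]` at 4 samples per second and a one-second beat, the first function yields a peak of 3 at 0.25 s in the first beat and a peak of 5 at 0.5 s in the second.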
The sampled outputs from channels Ch1, Ch2, Ch3 are stored in a designated memory
MC1, MC2, MC3 respectively (typically provided by designated areas of RAM 80),
in a series of what may be thought of as temporal intrinsic peak amplitude
coordinates, i.e. comprising a digital intrinsic peak amplitude value, e.g. $A_{C1}$
(typically 16-24 bits long per audio channel, in conformity with current CD and DVD
player standards) and a corresponding time value indicating the time elapsed since the
start of the transition phase at which that peak intrinsic amplitude occurred. These
three sets of coordinates may be represented in visual terms by three histograms, from
which a rapid appreciation of the relative intrinsic amplitude and timing of the peaks
can be obtained, and in Figs. 5A and B the histograms for the sections of track Z1
(represented by coordinates [$A_{Cn}^{N}$, ($NT + t_{Cn}^{N}$)]) and track Z2 (represented by coordinates
[$B_{Cn}^{N}$, ($NT + t_{Cn}^{N}$)]) which are to be mixed during the transition are shown, where: $A_{Cn}^{N}$
and $B_{Cn}^{N}$ are the N-th intrinsic peak amplitudes for tracks Z1 and Z2 from channel Cn, at
a time $NT + t_{Cn}^{N}$ after the start of the transition phase; N is an integer generated by a
processor 200 which increases by a value of 1 for each clock cycle during the
sampling; T is the time period equal to the beat of the track; and $t_{Cn}^{N}$ is the time
interval in the N-th clock cycle preceding occurrence of the peak amplitude $A_{Cn}^{N}$ or
$B_{Cn}^{N}$ as the case may be.

Using the peak intrinsic amplitude coordinates from each of
the channels Ch1-Ch3, a determination is then made by processor 70 as to which
frequency range is dominant for the pair of tracks Z1 and Z2 over their mutual
transition period. The dominant range will then be used to provide data necessary for
equalising the net output volume over the transition phase between the tracks Z1 and
Z2. Determination of the dominant range may be made on the basis of one or more
predetermined criteria, such as for example, the frequency range in which the average
peak intrinsic amplitude is highest over the duration of the transition period between
tracks (i.e. the period over which sampling by the signal processing architecture
illustrated in Fig. 4 occurred), or the frequency range in which the highest peak was
obtained over the duration of the transition period. In the present example the
dominant frequency range is chosen to be the one whose intrinsic peak amplitudes
have been used to time-stretch and synchronise tracks Z1 and Z2, which in this
example is the low frequency range.
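
The two example criteria for the dominant range (highest average peak, or highest single peak) can be sketched as below. The function name `dominant_band`, the dictionary layout and the band labels are assumptions made for illustration; the present example in the text instead simply adopts the band used for beat matching.

```python
def dominant_band(peaks_by_band, criterion="mean"):
    """Pick the band with the highest average peak or the highest single peak.

    `peaks_by_band` maps a band label to the peak amplitudes sampled for that
    band over the transition period (a hypothetical data layout).
    """
    if criterion == "mean":
        def score(peaks):
            return sum(peaks) / len(peaks)
    elif criterion == "max":
        score = max
    else:
        raise ValueError("criterion must be 'mean' or 'max'")
    return max(peaks_by_band, key=lambda band: score(peaks_by_band[band]))


sampled = {"low": [5, 5, 5], "mid": [1, 9, 1], "high": [2, 2, 2]}
by_mean = dominant_band(sampled, "mean")
by_max = dominant_band(sampled, "max")
```

The two criteria need not agree: in this invented data the low band has the highest average peak while the mid band contains the highest individual peak.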

Having generated intrinsic amplitude coordinates by sampling the transition section of
each track, the coordinates from the dominant channel are then used to provide
equalisation of the net output volume. Sampled outputs of the two tracks Z1 and Z2
from the dominant frequency channel which are to occur contemporaneously during
the mix are summed together (remembering that the outputs in the low frequency
range are synchronised as a result of time stretching and automatic synchronisation in
accordance with our co-pending European application 00303960.0) to provide a series
of summed contemporaneous values of peak intrinsic amplitude against time, i.e.
summed contemporaneous peak amplitude coordinates ($\Sigma A_{Cn}^{N}B_{Cn}^{N}$, $NT + t_{Cn}^{N}$).
These summed peak amplitude coordinates are illustrated schematically in the
histogram of Fig. 6, from which it can be seen that the variation of summed peak
amplitude with time is not constant over the course of the transition phase between
tracks. Similarly, if both tracks are amplified at the same constant level of gain over the
course of the transition phase, the net output volume from the speakers will
correspond substantially to this variation, and will correspondingly not be constant.
The net output volume may be equalised in many ways. Two simple ways in which
this can be done are either to vary the amplification of one or both tracks during the
transition phase to compensate for the variation of summed peak amplitude, or to
adjust the intrinsic amplitude of one or both tracks so that the summed peak amplitude
is constant over the transition phase.

To adjust the amplification gain over the transition period, a profile of amplification
level or gain with time is generated from the summed peak amplitude coordinates, and
is then applied to the two tracks. The amplification profile is generated by taking the
amplitude value from each summed peak amplitude coordinate, and comparing it to
the relatively constant intrinsic amplitude prior to entering the transition phase (NB
any differences in intrinsic "constant" amplitude of the two tracks are normalised prior
to mixing, either by an adjustment in amplification gain which is phased in linearly
during the transition phase, or by a modification of the intrinsic amplitude of the
incoming track, in this instance Z2). In the current example, the intrinsic amplitude of
the channel Ch1 frequency band (or in a different example whichever other frequency
band is determined as being dominant) prior to entering the transition phase is equal
to a substantially constant value $\alpha$, and the amplification gain q is at a constant value
Q. However, at a time $NT + t$ after the start of the transition phase the summed peak
amplitude $\Sigma A_{Cn}^{N}B_{Cn}^{N}$ has dropped below $\alpha$ by an amount $\delta\alpha^{N}$, given by the
expression ($\Sigma A_{Cn}^{N}B_{Cn}^{N} - \alpha$), to the value ($\alpha + \delta\alpha^{N}$). Fig. 6B shows values of $-\delta\alpha$ (i.e.
with inverted sign) against time (NB the convention being that $\delta\alpha$ has a sign which is
negative if $\Sigma A_{Cn}B_{Cn}$ is less than $\alpha$). The gain at that point in time during the transition
phase should therefore be increased in the proportion $-\delta\alpha^{N} / \Sigma A_{Cn}^{N}B_{Cn}^{N}$, to a value
$Q[1 - \delta\alpha^{N} / \Sigma A_{Cn}^{N}B_{Cn}^{N}]$, in order that the net output volume is equalised to the pre-transition
phase level. By comparing each of the summed peak amplitudes $\Sigma A_{Cn}^{N}B_{Cn}^{N}$ with the
value $\alpha$, a series of discrete modified amplification gain levels q, where:

$q = Q\,[1 - \delta\alpha^{N} / \Sigma A_{Cn}^{N}B_{Cn}^{N}]$

against time is generated, which in turn may be used to approximate a continuous
profile of amplification gain against time during the course of the transition phase
(e.g. by fitting a curve to the discrete values) and this profile is shown in Fig. 6C.
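
The discrete gain levels can be computed directly from the expression for q. The sketch below is illustrative only: `gain_profile`, `alpha` and `base_gain` are hypothetical names for the quantities the text calls the profile, the pre-transition level and Q, and no curve fitting is attempted.

```python
def gain_profile(summed_peaks, alpha, base_gain):
    """Return q = Q * (1 - delta / S) for each summed peak amplitude S.

    delta is S - alpha, so it is negative wherever the summed amplitude has
    dropped below the pre-transition level, and the gain is raised there.
    """
    profile = []
    for summed in summed_peaks:
        if summed <= 0:
            raise ValueError("summed peak amplitude must be positive")
        delta = summed - alpha
        profile.append(base_gain * (1 - delta / summed))
    return profile


profile = gain_profile([1.0, 0.8, 0.5], alpha=1.0, base_gain=2.0)
```

Since 1 - (S - alpha)/S equals alpha/S, each gain level multiplied by its summed amplitude returns Q times alpha, i.e. the pre-transition output level.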

The amplification profile is then applied to the outputs of the two audio players 10, 20
without discrimination as to frequency range (since the output of the players is not
naturally split into frequency bands) over the duration of the transition phase. The
gain levels specified by the amplification profile may be split between the amplifiers
30, 40 of the audio players 10, 20 in any ratio desired, provided that at any instant the
net amplification gain applied to the two tracks Z1, Z2 (i.e. the linear sum of the gain
applied to the tracks individually) is equal to the amplification gain specified by the
profile at that instant. In one embodiment the gain values will be split 50-50 between
the two players, so that the fade-out and fade-in of the two tracks as a result of their
intrinsic amplitude is replicated in relative terms in the transition phase.
Alternatively, the relative intrinsic peak amplitudes of the two tracks during the
transition phase may be taken into account, in which case the gain is apportioned
between the amplifiers 30, 40 so the fade-out and fade-in is substantially linear.
Alternatively the amplification profile is applied to only one track.

Although reference has frequently been made to the use of digital audio players in
conjunction with the method and apparatus of the present invention, it is not necessary
to use such players for implementation of the invention. For example, amplification
could be applied to digital audio of the final mix (or near final mix), and used to
produce a final mix audio file that is stored in memory.
Equalisation of the net output volume by modification of intrinsic amplitudes may
also be performed using the summed contemporaneous peak amplitude coordinates
shown in Fig. 6A. Once again each summed peak amplitude $\Sigma A_{Cn}^{N}B_{Cn}^{N}$ is compared
with the pre-transition phase "constant" level $\alpha$, to generate a value $\delta\alpha^{N}$ equal to the
difference between them. As previously, each value $\delta\alpha^{N}$ has a positive sign if the
summed peak amplitude $\Sigma A_{Cn}^{N}B_{Cn}^{N}$ is larger than $\alpha$, and a negative sign if smaller.
In the present example each summed peak amplitude $\Sigma A_{Cn}^{N}B_{Cn}^{N}$ is smaller than $\alpha$,
and so each summed peak amplitude must be increased to ($\Sigma A_{Cn}^{N}B_{Cn}^{N} - \delta\alpha^{N}$) in
order to make it equal to $\alpha$. The total increase required in the summed peak
amplitudes $\Sigma A_{Cn}^{N}B_{Cn}^{N}$ for equalisation is then apportioned between the individual
intrinsic peak amplitudes in proportion to their size, so the N-th intrinsic peak
amplitude value $A_{Cn}^{N}$ will be increased by a value:

$\Delta a^{N} = \delta\alpha^{N}\,[A_{Cn}^{N} / (A_{Cn}^{N} + B_{Cn}^{N})]$

and the intrinsic peak amplitude value $B_{Cn}^{N}$ will be increased by a value

$\Delta b^{N} = \delta\alpha^{N}\,[B_{Cn}^{N} / (A_{Cn}^{N} + B_{Cn}^{N})]$

From these absolute values $\Delta a^{N}$ and $\Delta b^{N}$ of peak amplitude incrementation, a set of
proportional reduction values $\Delta a^{N}/A_{Cn}^{N}$ and $\Delta b^{N}/B_{Cn}^{N}$ are easily calculable. These
discrete proportional reduction values may then be used to approximate a continuous
profile of proportional amplitude modification against time (for example by fitting a
curve to the points as in the case of the curve of Fig. 6C), which may then in turn be
used to modify each intrinsic amplitude value (as opposed simply to the peak intrinsic
amplitude values) of the respective track Z1 or Z2 by an amount proportional to its
amplitude. Once the intrinsic amplitudes of the tracks Z1 or Z2 have been modified,
the tracks may then be mixed simply by maintaining a constant amplification gain on
each track throughout the duration of the mix, since equalisation of the net volume
has been performed by the creation of the modified amplitude values.
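
The apportionment can be illustrated for a single pair of contemporaneous peaks. In this sketch the shortfall is taken as alpha minus the summed amplitude, so that the increments come out positive when the sum is too low; that sign handling, and the names `apportion`, `a_peak`, `b_peak` and `alpha`, are assumptions made for the example.

```python
def apportion(a_peak, b_peak, alpha):
    """Split the correction needed to reach `alpha` in proportion to size.

    Returns (delta_a, delta_b), the amounts by which the two contemporaneous
    peak amplitudes are changed so that their sum equals alpha.
    """
    total = a_peak + b_peak
    if total <= 0:
        raise ValueError("summed peak amplitude must be positive")
    shortfall = alpha - total  # positive when the summed peak is too low
    return shortfall * a_peak / total, shortfall * b_peak / total


delta_a, delta_b = apportion(0.6, 0.2, alpha=1.0)
```

Because each peak receives a share in proportion to its own size, the proportional change is the same for both tracks at a given instant (here 25 per cent), while the absolute increments differ.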

Physical modification of the intrinsic amplitudes involves copying the transition
section of each track Z1, Z2 to a RAM, and then modifying the copied version of the
transition section which is stored in the RAM. This is feasible, since the maximum
frequency of a CD-quality digital audio signal is approximately 22 kHz, and so is
sampled at 44.1 kHz in order to capture all the variations in amplitude (i.e. two
"values" of amplitude per cycle). If the transition between the tracks lasts for ten
seconds, then 0.88 MB of memory will be required for each track (digital audio usually
operating on 16 bits rather than 8), meaning a total required RAM capacity of less
than 2 MB.
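
The memory figure can be checked with a short calculation. The sketch assumes a single audio channel per track, which is what the 0.88 MB figure implies; the function name and its parameters are hypothetical.

```python
def transition_buffer_bytes(seconds, sample_rate=44_100, bits_per_sample=16,
                            channels=1):
    """Bytes needed to hold one track's transition section in RAM."""
    return seconds * sample_rate * channels * bits_per_sample // 8


per_track = transition_buffer_bytes(10)   # ten-second transition
both_tracks = 2 * per_track
```

For a ten-second transition this gives 882,000 bytes per track and 1,764,000 bytes for the pair, consistent with the stated total of less than 2 MB. A stereo copy would double both figures.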

In a further embodiment of the present invention, equalisation is performed by
considering each of the tracks separately. Referring now to Figs. 7A and 7B, standard
fade-out and fade-in amplitude profiles are lines of equal gradient, but opposing sign.
From Fig. 7C it can be readily seen that if a pair of tracks having such profiles are
mixed together, with the amplification gain remaining constant during the transition
phase, the net output volume will be constant. Thus it is possible using these profiles
to pre-configure the introduction and play-out parts of a given track to the template so
that it will mix with any other track similarly configured. The pre-configuration may
be performed either by adjustment of the amplification gain over the course of the
transition phase, or modification of the intrinsic amplitude, as described in each case
above, so that the fade-out and fade-in sections of a given track correspond to the
template profile. This embodiment has been described in connection with
substantially linear profiles of amplitude variation with time. Other profiles which
sum to provide equalisation may also be employed, and preferably the incoming and
outgoing profiles will sum to provide constant or substantially constant output
amplitude over the duration of the transition.
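As an illustration of the template approach, the following minimal sketch builds linear fade-out and fade-in profiles, confirms that they sum to a constant, and derives the gain needed to bring one sampled intrinsic amplitude onto the template. The function names, the unit target level and the handling of silent samples are assumptions made for the example.

```python
def fade_out(t, duration, level=1.0):
    """Template fade-out: falls linearly from `level` to 0 over `duration`."""
    return level * (1.0 - t / duration)


def fade_in(t, duration, level=1.0):
    """Template fade-in: rises linearly from 0 to `level` over `duration`."""
    return level * (t / duration)


def conforming_gain(intrinsic_amplitude, target_amplitude):
    """Gain that maps a sampled intrinsic amplitude onto the template value."""
    if intrinsic_amplitude == 0:
        return 0.0  # silent sample: no finite gain reaches a non-zero target
    return target_amplitude / intrinsic_amplitude


duration = 10.0
times = [i * duration / 10 for i in range(11)]

# Profiles of equal gradient and opposing sign sum to a constant net level.
net = [fade_out(t, duration) + fade_in(t, duration) for t in times]

# Gain for a peak of intrinsic amplitude 0.5 sampled halfway through the fade-out.
gain_at_midpoint = conforming_gain(0.5, fade_out(5.0, duration))
```

Any other pair of profiles could be substituted for `fade_out` and `fade_in`, provided their sum stays constant over the transition.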

In a further modification, a combination of amplification adjustment and modification 
to intrinsic amplitude may be employed, either to tailor two tracks together 
individually as described above, or to configure tracks to a template profile. 

In an alternative embodiment variations in net output volume are minimised by
matching sampled fade-out and fade-in sections of two tracks in a variety of temporal
juxtapositions, i.e. different instances of starting to play the fade-in part of one track
simultaneously with the fade-out part of another, and the temporal juxtaposition
yielding the smallest variation in net output volume over the duration of the transition
is adopted. While this embodiment may not necessarily provide full, or substantially
full equalisation, it nevertheless reduces net output volume variations in comparison
to what they would otherwise be, and has the virtue of being simple and therefore
quicker than the other embodiments. Referring now to Fig. 8A, the sampled peak
amplitudes of the sections of tracks Z1 and Z2 which are to be mixed are juxtaposed
side by side, i.e. the last value of peak amplitude of Z1 is adjacent the first peak
amplitude of Z2. With the tracks Z1, Z2 juxtaposed in such a manner, the processor 70
then performs a comparison in respect of each peak amplitude, to generate a series of
values $|\delta a^{N}|$, where:

$$|\delta a^{N}| = \left| a - \sum \left( A_{cn}^{N} + B_{cn}^{N} \right) \right|$$

Thus $|\delta a^{N}|$ is the absolute value of the difference between the sum of
contemporaneous peak amplitude values and the value $a$, which is established as the
substantially constant amplitude prior to the transition phase. In the example
illustrated in Fig. 8A there are no summed peak amplitude values, and so the
expression $\sum ( A_{cn}^{N} + B_{cn}^{N} )$ is simply equal to the individual peak
amplitude in each case. An average $\varepsilon_1$ of the values $|\delta a^{N}|$ is then
obtained for the first juxtaposition.

The two sets of peak amplitudes are then re-juxtaposed, with the first and last peak
amplitudes of tracks Z2 and Z1 respectively summed together as illustrated in Fig. 8B,
and a value $\varepsilon_2$ is obtained for that juxtaposition, whereupon the peak
amplitudes are re-juxtaposed by one, i.e. moving the peak amplitudes of track Z2 "back
in time" by one peak amplitude, and a further value of $\varepsilon$ is obtained for that
juxtaposition. This process is repeated to obtain a value of $\varepsilon$ for each possible
juxtaposition, i.e. through the juxtaposition illustrated in Fig. 8C until the
juxtaposition of Fig. 8D is reached. This yields a series of values
$\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_j$, each of which is representative of the
variation in intrinsic amplitude (and therefore, for a given level of amplification gain,
net output volume) for a particular juxtaposition. The juxtaposition with the most
constant intrinsic amplitude will therefore be the juxtaposition with the lowest
value of $\varepsilon$, which is thus selected for the transition, and the two tracks are
then played in the selected juxtaposition at a constant level of amplification.
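The search over juxtapositions can be sketched as follows. This is a minimal illustration, assuming the peak amplitudes of each section are held as equally spaced lists and that a juxtaposition is identified by the number of overlapping peak slots (0 for the side-by-side arrangement of Fig. 8A); the function names and the sample data are hypothetical.

```python
def juxtaposition_error(out_peaks, in_peaks, overlap, steady_level):
    """Average absolute deviation of the summed peaks from the steady level."""
    start_in = len(out_peaks) - overlap  # slot at which the incoming track starts
    length = max(len(out_peaks), start_in + len(in_peaks))
    deviations = []
    for slot in range(length):
        total = 0.0
        if slot < len(out_peaks):
            total += out_peaks[slot]
        j = slot - start_in
        if 0 <= j < len(in_peaks):
            total += in_peaks[j]
        deviations.append(abs(steady_level - total))
    return sum(deviations) / len(deviations)


def best_juxtaposition(out_peaks, in_peaks, steady_level):
    """Return (overlap with the lowest average error, all errors by overlap)."""
    max_overlap = min(len(out_peaks), len(in_peaks))
    errors = {
        k: juxtaposition_error(out_peaks, in_peaks, k, steady_level)
        for k in range(max_overlap + 1)
    }
    return min(errors, key=errors.get), errors


# Illustrative data: a linear fade-out and a linear fade-in.
z1_peaks = [1.0, 0.75, 0.5, 0.25]
z2_peaks = [0.25, 0.5, 0.75, 1.0]
best, errors = best_juxtaposition(z1_peaks, z2_peaks, steady_level=1.0)
```

For this data an overlap of three slots gives summed peaks equal to the steady level throughout, so that juxtaposition would be selected.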

A further independent aspect of the present invention relates to a qualitative aspect of
providing an appealing mix between two tracks. Referring again to Fig. 5, while the
beats of the tracks Z1 and Z2 in the dominant frequency band f1 sampled via channel
Ch1 are synchronised for the transition between tracks (this process of
synchronisation being performed in accordance with the disclosure of our co-pending
European patent application 00303960.0), the other musical elements of the tracks
occurring in other frequency bands are unlikely to be so. Thus, depending upon the
relative timing of events in these frequency bands, there may be a clash between
them, i.e. a combination of events in the same or a similar frequency channel which
results in an unappealing mix. To ameliorate such a situation, events from the two
tracks in the same or similar frequency bands are matched with each other, that is to
say their relative timing and amplitude are compared, and one or more predetermined
decision making criteria are applied to the compared events to determine whether a
clash is present.

Referring once again to Figs. 5A and 5B, each of the sampled peak amplitudes from
each of the output channels Ch1-3 has a temporal coordinate $NT + t_{cn}^{N}$, where, as
referenced above, N is the number of clock cycles (a single clock cycle being equal to
the time period of a beat of the two tracks Z1 and Z2 once time-stretched), and
$t_{cn}^{N}$ is the time interval between the start of a clock cycle and the generation of
the Nth peak amplitude in channel n. It is therefore possible to determine the relative
timing of two peak amplitudes in e.g. the high frequency channel Ch3 from tracks Z1
and Z2, since each peak amplitude output from each of tracks Z1 and Z2 in channel Ch3
has a temporal coordinate related to the master clock cycle by the iteration integer N,
and the time interval $t_{c3}^{N}$. Peak amplitudes from the non-dominant output channels
having equivalent frequency bands are therefore compared from the point of view of
relative timing and amplitude in order to determine, on the basis of one or more
predetermined criteria, whether they are likely to cause a clash. The determinative
criteria may be for example whether their amplitudes are similar to within a
predetermined value, and whether they occur within a predetermined time interval of
each other. In the event that a clash is deemed likely, a number of remedial processes
are possible. A first such process requires an amplifier for each of the tracks Z1, Z2
which enables independent amplification levels for different frequency bands, in
which case the processor 70 operates to reduce the amplification level of the relevant
output channel for one of the tracks; if desired the processor also operates to increase
correspondingly the amplification level of the relevant output channel on the other to
compensate. Alternatively, a modification of the intrinsic amplitudes may be
performed to reduce the amplitude levels for one of the tracks, and if desired to
increase amplitudes on the other of the tracks.
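A minimal sketch of the per-channel clash test follows, assuming each sampled peak is stored with its clock-cycle index, its offset within that cycle and its amplitude. The `Peak` class, the threshold parameters `max_dt` and `max_da`, and the fixed `reduced_gain` are hypothetical; the description leaves the actual criteria and the size of the reduction open.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    cycle: int        # N: index of the master clock cycle (one beat)
    offset: float     # time from the start of that cycle to the peak, seconds
    amplitude: float  # sampled peak amplitude


def peak_time(peak, beat_period):
    """Temporal coordinate N*T + t of a peak."""
    return peak.cycle * beat_period + peak.offset


def find_clashes(peaks_a, peaks_b, beat_period, max_dt, max_da):
    """Pairs of peaks that are close in time and similar in amplitude."""
    clashes = []
    for pa in peaks_a:
        for pb in peaks_b:
            dt = abs(peak_time(pa, beat_period) - peak_time(pb, beat_period))
            da = abs(pa.amplitude - pb.amplitude)
            if dt <= max_dt and da <= max_da:
                clashes.append((pa, pb))
    return clashes


def channel_gains(clashes, reduced_gain=0.5):
    """(gain for first track, gain for second track) in this channel."""
    if not clashes:
        return 1.0, 1.0
    return 1.0, reduced_gain  # assumption: the second track is the one reduced


# Illustrative high-frequency channel data for two time-stretched tracks.
z1_ch3 = [Peak(0, 0.125, 0.75), Peak(1, 0.25, 0.5)]
z2_ch3 = [Peak(0, 0.15, 0.7), Peak(1, 0.4, 0.5)]
clashes = find_clashes(z1_ch3, z2_ch3, beat_period=0.5, max_dt=0.05, max_da=0.1)
gains = channel_gains(clashes)
```

Compensating by raising the gain of the other track, as the text allows, would only change the first element of the returned pair.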

Preferably, in the event that this frequency blending technique is to be employed in a
system also employing techniques to equalise net output volume, the volume
equalisation processing is performed first, so that any effect this may have on the
output volume of elements from a given non-dominant frequency band may be taken
into account, both in determining whether a clash is likely to occur, and in modifying
output volumes for musical elements in a particular frequency band.

As mentioned previously in connection with Fig. 3, the variation of intrinsic
amplitude of a track is, in practice, likely to be significantly more complex than that
shown for the purposes of explanation in Fig. 3. Two more realistic examples of
variations in intrinsic amplitude are shown in Figs. 9A and B. One result of the
significantly greater complexity which exists in practice is that sampling the tracks
using channels having fixed and predetermined frequency bands is unlikely to provide
optimum results for each track. For example the dominant bass line of a particular
track, which is most frequently used both for time stretching and determining
adjustments for equalisation of output amplitude, may have a frequency which
straddles two of the predetermined fixed frequency bands, meaning that variations of
amplitude at this frequency would be sampled partly in the low frequency channel and
partly in the mid-frequency channel. To provide optimum equalisation in each case, a
preferred embodiment of the present invention provides that following copying of a
section of each of the tracks selected for mixing into RAM, the tracks are analysed to
determine, from the variation in amplitude across the analysed spectrum of
frequencies of both of the tracks, an appropriate number and range of frequency bands.
Thus the frequency and range of the bands, and therefore the number of them, may
vary from one crossfade to another. Selection of bands is typically performed initially
for an individual track, by considering the intrinsic amplitude over the time selected
for mixing. For this time interval, a provisional frequency band is assigned for each
peak amplitude above a given value, and which is spaced by more than a
predetermined frequency range from another such peak. This process is repeated for
the second of the two tracks to be mixed, and the two sets of provisional designated
frequency bands (and the variations in amplitude within them) for the two tracks are
then compared. From the comparison of the two provisional sets of bands, at least
one common dominant frequency band, to be used for equalisation purposes, is
defined, typically by selecting the two most individually dominant provisional
frequency bands which lie within a predetermined frequency range of each other, and
then defining a common frequency band which encompasses the peak amplitudes of
the two provisional bands. Further common frequency bands may be defined for the
purpose of preventing clashes if desired.
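One way the band selection might be sketched is shown below. It assumes each track's transition section has already been reduced to a list of (frequency, amplitude) spectral peaks, picks provisional bands greedily from the strongest peak down, and takes "most individually dominant" to mean the cross-track pair with the greatest combined amplitude. Those choices, the function names and the `margin_hz` padding are assumptions, not details given in the description.

```python
def provisional_bands(spectral_peaks, min_amplitude, min_spacing_hz):
    """Keep peaks above `min_amplitude` that are well separated in frequency.

    spectral_peaks: list of (frequency_hz, amplitude) tuples.
    """
    chosen = []
    for freq, amp in sorted(spectral_peaks, key=lambda p: p[1], reverse=True):
        if amp <= min_amplitude:
            break  # remaining peaks are weaker still
        if all(abs(freq - f) > min_spacing_hz for f, _ in chosen):
            chosen.append((freq, amp))
    return chosen


def common_dominant_band(bands_1, bands_2, max_separation_hz, margin_hz=0.0):
    """Band (low_hz, high_hz) spanning the strongest nearby pair, or None."""
    candidates = [
        (a1 + a2, f1, f2)
        for f1, a1 in bands_1
        for f2, a2 in bands_2
        if abs(f1 - f2) <= max_separation_hz
    ]
    if not candidates:
        return None
    _, f1, f2 = max(candidates)
    return (min(f1, f2) - margin_hz, max(f1, f2) + margin_hz)


# Illustrative spectral peaks for the two tracks.
z1 = [(60.0, 0.9), (70.0, 0.4), (400.0, 0.6), (3000.0, 0.1)]
z2 = [(85.0, 0.8), (450.0, 0.5), (5000.0, 0.3)]

bands_1 = provisional_bands(z1, min_amplitude=0.2, min_spacing_hz=50.0)
bands_2 = provisional_bands(z2, min_amplitude=0.2, min_spacing_hz=50.0)
band = common_dominant_band(bands_1, bands_2, max_separation_hz=100.0, margin_hz=10.0)
```

Here the bass peaks at 60 Hz and 85 Hz are merged into one common band, which avoids splitting a bass line across two fixed channels.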

Clashes may however be prevented without defining further frequency bands. For
example, to provide the maps of Figs. 9A and B, the entire section of each track
selected for the crossfade will have been copied into RAM. It is therefore possible
simply to compare each peak amplitude of one track with nearby peak amplitudes of
the other, and determine on the basis of each comparison, whether a clash is likely to
occur between the two peaks; if one is, then one of the peaks is reduced until the clash
is avoided. The criteria for determining the possibility of a clash are typically as set
out above: i.e. whether two peak amplitudes are similar to within a predetermined
amplitude value, whether they occur within a predetermined time interval of each
other, and whether they occur within a predetermined frequency range of each other
(this latter criterion being additional as a result of not considering peak amplitudes in
frequency bands).

Referring now to Fig. 10, a peak amplitude P of the outgoing, and in this example
dominant track is illustrated graphically. The peak amplitude P has an amplitude A, a
frequency $\nu$, and occurs at time $\tau$. A box whose geometric centre is at the
coordinates $(A, \nu, \tau)$, and whose dimensions are
$\Delta A \times \Delta\nu \times \Delta\tau$, defines the zone within
amplitude/frequency/time space within which the occurrence of a peak amplitude
from the incoming track would constitute a clash. A peak amplitude P' from the
incoming track is illustrated in dotted lines. It can be seen that this peak lies within
the box and therefore is likely, in accordance with the selected criteria, to cause a
clash. The processor therefore reduces the amplitude of this peak until it no longer
lies within the box to avoid a clash. This process is repeated for all peak amplitudes
outside of the frequency band which is dominant (i.e. which has been used for
equalisation), preferably after equalisation has been performed. The dominant track
is simply the track which is selected as the track in relation to which clashes will be
defined, as opposed to the track whose peak amplitudes are to be suppressed.

Because the reduction in peak amplitude could take an amplitude from one box and
into another, thus causing a further reduction in the peak amplitude, which could in
theory result in an iterative reduction of some frequencies to negligible (i.e.
non-audible) levels, it is necessary either to restrict the number of iterations of the
process described above, or to stop the process once the non-dominant amplitudes
have dropped below a predetermined level.
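The box test and the bounded reduction can be sketched together. This is a minimal illustration in which the box is centred on each dominant peak, the clashing peak is lowered in fixed steps, and the loop stops either on an iteration limit or at an amplitude floor. The `Peak` class, `step`, `floor` and `max_iterations` are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    amplitude: float
    frequency: float  # Hz
    time: float       # seconds


def in_box(dominant, other, d_amp, d_freq, d_time):
    """True if `other` lies inside the box centred on `dominant`."""
    return (abs(other.amplitude - dominant.amplitude) <= d_amp / 2
            and abs(other.frequency - dominant.frequency) <= d_freq / 2
            and abs(other.time - dominant.time) <= d_time / 2)


def suppress_clashes(dominant_peaks, other_peaks, d_amp, d_freq, d_time,
                     step=0.05, floor=0.05, max_iterations=100):
    """Lower each clashing non-dominant peak until it leaves every box."""
    result = []
    for peak in other_peaks:
        amplitude = peak.amplitude
        for _ in range(max_iterations):          # guard 1: iteration limit
            candidate = Peak(amplitude, peak.frequency, peak.time)
            clashing = any(in_box(d, candidate, d_amp, d_freq, d_time)
                           for d in dominant_peaks)
            if amplitude <= floor or not clashing:  # guard 2: amplitude floor
                break
            amplitude = max(floor, amplitude - step)
        result.append(Peak(amplitude, peak.frequency, peak.time))
    return result


dominant = [Peak(0.8, 1000.0, 2.0)]
incoming = [Peak(0.75, 1010.0, 2.01), Peak(0.3, 5000.0, 2.0)]
adjusted = suppress_clashes(dominant, incoming, d_amp=0.2, d_freq=50.0, d_time=0.1)
```

Only the first incoming peak starts inside the box, so only its amplitude is lowered; the second is returned unchanged.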

Analysis of the response of the human ear to different frequencies has shown that,
over the range of audible frequencies, the ear is more responsive to some frequencies
than others. Thus an audio signal having a constant output volume, whose frequency
increases steadily to sweep through the spectrum of audible frequencies, will seem to
a listener to be louder at some frequencies in the audible range than others (see for
example "The Computer Music Tutorial", Curtis Roads, MIT Press 1998, pp. 1049-
1069). In a modification of the technique described above therefore, the sizes of the
boxes in amplitude-frequency-time space are weighted in accordance with the
established response of the ear. That is to say that at frequencies to which the ear is
less responsive the boxes are smaller (i.e. a clash between two signals is considered
likely only if they are extremely similar), and vice versa.
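A sketch of one possible weighting is given below. The description does not say which model of the ear's response is used; the standard A-weighting curve is substituted here purely as a stand-in, and `box_scale` and `min_scale` are hypothetical. The factor returned would multiply the box dimensions, so boxes shrink at frequencies where the curve indicates lower sensitivity.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting relative response in dB (about 0 dB at 1 kHz); freq_hz > 0."""
    f2 = freq_hz ** 2
    ra = (12194.0 ** 2 * f2 ** 2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.0


def box_scale(freq_hz, min_scale=0.1):
    """Linear factor in [min_scale, 1] applied to a box's dimensions."""
    linear = 10.0 ** (a_weighting_db(freq_hz) / 20.0)
    return max(min_scale, min(1.0, linear))


# Boxes are smaller at 100 Hz than at 1 kHz under this stand-in curve.
scale_low = box_scale(100.0)
scale_mid = box_scale(1000.0)
```

The lower clamp keeps a box from collapsing entirely at the extremes of the audible range.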

The range of amplitudes, frequencies and the time interval which define a clash
between two peak amplitudes from different tracks have been defined above using
Cartesian coordinates, and so boxes within frequency-amplitude-time space have
naturally resulted. This is merely for convenience, and any boundary conditions for
clashes deemed most appropriate may be defined. Thus for example it is perfectly
feasible to define a range of frequencies within which a clash may occur, which range
varies with variations in amplitude and time, resulting in e.g., a sphere in frequency-
amplitude-time space which defines a clash.
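An alternative boundary condition of this kind might look like the following sketch. Because the three axes have different units, the "sphere" is expressed as an ellipsoid whose radii normalise each axis; the tuple layout and the radius parameters are assumptions made for the example.

```python
def in_ellipsoid(dominant, other, r_amp, r_freq, r_time):
    """Clash test with an ellipsoidal boundary.

    dominant, other: (amplitude, frequency_hz, time_s) tuples.
    r_amp, r_freq, r_time: radii of the ellipsoid along each axis.
    """
    da = (other[0] - dominant[0]) / r_amp
    df = (other[1] - dominant[1]) / r_freq
    dt = (other[2] - dominant[2]) / r_time
    return da * da + df * df + dt * dt <= 1.0


p = (0.8, 1000.0, 2.0)
near_centre = (0.82, 1005.0, 2.01)   # close on every axis
near_corner = (0.89, 1024.0, 2.049)  # within each radius, but far overall

clash_centre = in_ellipsoid(p, near_centre, r_amp=0.1, r_freq=25.0, r_time=0.05)
clash_corner = in_ellipsoid(p, near_corner, r_amp=0.1, r_freq=25.0, r_time=0.05)
```

The second peak would fall inside a box of the same half-widths but lies outside the ellipsoid, which shows how the acceptable frequency range narrows as the amplitude and time differences grow.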

The methods described thus far have all related to analysis and processing of the
audio data which occurs prior to playing. It is however possible to perform a degree
of equalisation in real time. For example, using a simplified version of the apparatus
of Fig. 4 to sample the output amplitude of the audio sources (i.e. the amplitude after
amplification), values of peak output amplitude for each track can be generated which
can be compared to values of desired output amplitude from a predetermined
amplitude profile, such as the ones illustrated in Figs. 7A and B, and an instantaneous
adjustment to the amplification of the track can be made on the basis of the
comparison, in order to cause the output amplitude of each track to conform
substantially to the predetermined profiles.
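A single step of such a real-time correction might be sketched as follows. The proportional update, the `rate` smoothing factor and the `max_gain` ceiling are assumptions introduced for the example; the description says only that the amplification is adjusted on the basis of the comparison.

```python
def adjust_gain(current_gain, measured_output, target_output,
                rate=1.0, max_gain=4.0):
    """Return an updated gain that moves the output towards the target.

    measured_output: sampled peak amplitude after amplification.
    target_output:   value of the predetermined profile at this instant.
    rate:            1.0 corrects fully in one step, smaller values smooth.
    """
    if measured_output <= 0.0:
        return current_gain  # nothing measurable to correct against
    ideal = current_gain * target_output / measured_output
    new_gain = current_gain + rate * (ideal - current_gain)
    return max(0.0, min(max_gain, new_gain))


# Output is twice the profile value, so the gain is halved.
g1 = adjust_gain(1.0, measured_output=0.5, target_output=0.25)
# A very quiet passage would demand a large gain, which is clamped.
g2 = adjust_gain(1.0, measured_output=0.1, target_output=1.0)
```

Calling the function once per sampled peak for each track would keep both outputs close to their fade-out and fade-in profiles.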
CLAIMS

1. A method for the automated mixing of at least two pieces of musical content
comprising the steps of:

selecting first and second sections of first and second tracks respectively, over
which transition between playing the first and second tracks will be made;

analysing intrinsic recorded amplitude of the first and second tracks over the
first and second sections respectively;

simultaneously playing the first and second sections of the first and second
tracks;

effecting transition from playing the first track to playing the second track by
reducing output amplitude of the first track over duration of the first section and
increasing output amplitude of the second track over duration of the second section;
and

using analysis of the intrinsic amplitude of at least one of the first and second
tracks to equalise variations in net output volume from the first and second tracks over
the duration of the transition.

2. Wherein equalisation provides a substantially constant net output amplitude over
duration of the transition.

3. A method according to claim 1 wherein equalisation of the aforesaid variations
includes the step of adjusting amplification of at least one of the tracks over duration
of the transition.

4. A method according to claim 3 further comprising the step of generating a profile 
of amplification with time for at least one of the tracks over duration of the transition. 

5. A method according to claim 4 wherein the amplification profile is generated by
pairing at least selected intrinsic amplitude values of at least one of the tracks with
further amplitude values, and determining, for each paired intrinsic amplitude value, a
value of amplification required in order to generate from the selected intrinsic
amplitude a target output amplitude.

6. A method according to claim 5 wherein the further amplitude values form a
template profile of predetermined amplitude with time, and the target output
amplitude corresponds to the contemporaneous amplitude of the template.

7. A method according to claim 6 wherein the further amplitude values are
contemporaneous intrinsic amplitudes of the other of the first and second tracks, and
the combined target output amplitude of the first and second tracks has a
predetermined constant value.



8. A method according to claim 1 wherein equalisation of the aforesaid variations 
includes the step of creating modified intrinsic amplitude values for at least one of the 
tracks over the duration of the transition prior to amplification. 

9. A method according to claim 8 further comprising the steps of: generating a copy 
of at least the section of one of the tracks over which the aforesaid transition will be 
made; prior to amplification, modifying at least one intrinsic amplitude value from 
the copy; and amplifying the modified copy during the transition. 

10. A method according to claim 9 further comprising the steps of: pairing at least
selected intrinsic amplitude values of the aforesaid copy with contemporaneous
amplitude values from a template profile of amplitude with time; and modifying
amplitude values from the copy so that they are equal to the contemporaneous values
from the template amplitude profile.

11. A method according to claim 9 further comprising the steps of summing a
plurality of contemporaneous intrinsic amplitude values of the first and second tracks
over the duration of the transition, and normalising each set of contemporaneous
summed amplitudes so that the sum is equal to a predetermined constant amplitude,
by modifying the intrinsic amplitude values of the copy of at least one of the tracks.

12. A method according to claim 11 wherein copies of both the first and second
sections of the first and second tracks are generated, and normalising each set of
contemporaneous summed amplitudes is achieved by modifying the intrinsic recorded
amplitude values of each of the copies of the first and second tracks.

13. A method according to claim 11 further comprising the step of providing a
constant and predetermined level of amplification to both the first and second tracks
over the duration of the transition from playing of the first track to playing of the
second track.

14. A method according to claim 1 wherein equalisation of the aforesaid variations
includes the step of adjusting amplification of at least one of the tracks over duration
of the transition, and the step of creating modified intrinsic amplitude values for at
least one of the tracks over duration of the transition.

15. A method according to claim 1 further comprising the steps of: generating an
indication of variation in output volume for a plurality of temporal juxtapositions for
simultaneous playing of the two tracks; selecting the temporal juxtaposition having
the lowest indicated variation in output amplitude, and effecting transition from the
first to the second track by playing the tracks in the selected temporal juxtaposition.

16. A method according to claim 15 wherein the indication of output amplitude for a
given temporal juxtaposition is generated by comparing a series of summed amplitude
values of the first and second tracks to a predetermined amplitude level.

17. A method according to claim 16 wherein the indication of output amplitude for a
given temporal juxtaposition is an average of the difference between each summed
amplitude value and the predetermined amplitude level.

18. A method according to claim 1 wherein the equalisation of the output amplitude
takes place in real time.

19. A method according to claim 18 wherein output amplitude of the first and
second tracks is sampled, the sampled output amplitude is compared to a template
profile of output amplitude with time, and amplification of one or both tracks is
adjusted in accordance with a result of the comparison.

20. A method according to claim 1 further comprising the step, subsequent to
analysing the intrinsic amplitude of the first and second tracks, of defining for the first
and second tracks, at least one frequency band.

21. A method according to claim 5 further comprising the step, subsequent to
analysing the intrinsic amplitude of the first and second tracks, of defining for the first
and second tracks, at least one frequency band, and wherein the intrinsic amplitude
values are selected from within the frequency band.

22. A method according to claim 11 further comprising the step, subsequent to
analysing the intrinsic amplitude of the first and second tracks, of defining for the first
and second tracks, at least one frequency band, wherein the contemporaneous intrinsic
amplitudes lie within the frequency band.

23. Apparatus for equalising output amplitude of first and second sections of
musical content intended for simultaneous output, the apparatus comprising:

first and second audio data sources on which the first and second sections of the
musical content are stored;

first and second music players, adapted to generate first and second audio signals
from the first and second audio data sources respectively;

an amplifier for amplifying the first and second audio signals;

an analyser, adapted to analyse at least first and second audio signals, and to
determine an extent to which combined intrinsic amplitude of the first and second
audio signals, resulting from simultaneous playing of the signals, varies from a
constant value.

24. Apparatus according to claim 23 wherein the analyser is additionally adapted to
generate, on the basis of the extent of variation of the combined intrinsic amplitude
from the constant value, an amplification profile for each of the first and second audio
signals, in accordance with which the first and second audio signals will be amplified,
thereby to equalise output amplitude during simultaneous playing of the first and
second audio signals.

25. Apparatus according to claim 24, wherein the analyser includes a memory in
which copies of the first and second audio data are stored prior to playing.

26. Apparatus according to claim 25 wherein the analyser is adapted to modify, prior
to playing, intrinsic amplitudes of at least one of the first and second audio data stored
in the memory, in accordance with the extent of variation of the combined intrinsic
amplitude from the constant value.

27. Apparatus according to claim 23, wherein the analyser is adapted to generate an
error profile indicating a variation from the constant value of the combined intrinsic
amplitude with time, and the error profile is used by the analyser to generate an
amplification profile for each of the first and second audio signals, in accordance with
which the first and second audio signals will be amplified, thereby to equalise output
amplitude during simultaneous playing of the first and second audio signals.

28. Apparatus according to claim 23 wherein the analyser is adapted to generate an
error profile indicating a variation from the constant value of the combined intrinsic
amplitude with time, and the error profile is used by the analyser to modify, prior to
playing, intrinsic amplitudes of at least one of the first and second audio data stored in
the memory, in accordance with the extent of variation of the combined intrinsic
amplitude from the constant value.

29. Apparatus according to claim 23 wherein the analyser is adapted to sample the
first and second audio signals during playing, to compare an instantaneous output
amplitude of each signal with a predetermined profile of output amplitude variation
with time, and to alter amplification of the first and/or second audio signals in
accordance with the results of each comparison.